An Evaluation of Intrinsic Dimensionality Estimators

Authors

  • Peter J. Verveer
  • Robert P. W. Duin
Abstract

The intrinsic dimensionality of a data set may be useful for understanding the properties of classifiers applied to it and thereby for the selection of an optimal classifier. In this paper we compare the algorithms for two estimators of the intrinsic dimensionality of a given data set and extend their capabilities. One algorithm is based on the local eigenvalues of the covariance matrix in several small regions in the feature space. The other estimates the intrinsic dimensionality from the distribution of the distances from an arbitrary data vector to a selection of its neighbors. The characteristics of the two estimators are investigated and the results are compared. It is found that both can be applied successfully, but that they might fail in certain cases. The estimators are compared and illustrated using data generated from chromosome banding profiles.
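The two ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The function names, the neighborhood size, the eigenvalue threshold, and the plain log-log fit of mean neighbor distance against neighbor rank are all assumptions chosen for illustration, and the simple fit is only approximate for small neighbor counts.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix for the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def local_eigenvalue_estimate(x, n_neighbors=20, threshold=0.05):
    """Median number of significant local covariance eigenvalues.

    A small region is formed around every point from its nearest
    neighbors. An eigenvalue counts as significant when it exceeds
    `threshold` times the largest eigenvalue of that region.
    Both parameter values are illustrative assumptions.
    """
    dist = pairwise_distances(x)
    counts = []
    for i in range(len(x)):
        idx = np.argsort(dist[i])[:n_neighbors + 1]  # region includes point i
        eigvals = np.linalg.eigvalsh(np.cov(x[idx], rowvar=False))
        counts.append(int((eigvals > threshold * eigvals.max()).sum()))
    return int(np.median(counts))


def neighbor_distance_estimate(x, k_max=10):
    """Estimate from the growth of neighbor distances with neighbor rank.

    Assumes the mean distance to the k-th neighbor grows roughly like
    k**(1/d), so the slope of log(distance) against log(k) approximates 1/d.
    """
    # Sort each row and drop column 0, the zero distance of a point to itself.
    dist = np.sort(pairwise_distances(x), axis=1)[:, 1:k_max + 1]
    mean_r = dist.mean(axis=0)
    k = np.arange(1, k_max + 1)
    slope = np.polyfit(np.log(k), np.log(mean_r), 1)[0]
    return 1.0 / slope


# Synthetic check: a 2-D plane embedded isometrically in 5-D space.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # orthonormal columns
plane = rng.uniform(size=(500, 2)) @ basis.T
print(local_eigenvalue_estimate(plane), neighbor_distance_estimate(plane))
```

On noise-free synthetic data like this, both functions should report a value near the true dimensionality of 2. Real data with noise or curvature would need more careful choices of region size and threshold.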


Similar resources

Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey

The estimation of intrinsic dimensionality of high-dimensional data still remains a challenging issue. Various approaches to interpret and estimate the intrinsic dimensionality have been developed. Referring to the following two classifications of estimators of the intrinsic dimensionality (local/global estimators and projection techniques/geometric approaches), we focus on the fractal-based methods ...


Multiscale Estimation of Intrinsic Dimensionality of Data Sets

We present a novel approach for estimating the intrinsic dimensionality of certain point clouds: we assume that the points are sampled from a manifold M of dimension k, with k << D, and corrupted by D-dimensional noise. When M is linear, one may analyze this situation by SVD: with no noise one would obtain a rank k matrix, and noise may be treated as a perturbation of the covariance matrix. Whe...


Intrinsic Dimensionality Estimation in Visualizing Toxicity Data

Over the years, a number of dimensionality reduction techniques have been proposed and used in chemoinformatics to perform nonlinear mappings. Nevertheless, data visualization techniques can be efficiently applied for dimensionality reduction mainly when the data are not really high-dimensional and can be represented as a nonlinear low-dimensional manifold when it is possible to reduce...


An Introduction to Dimensionality Reduction Using Matlab

Dimensionality reduction is an important task in machine learning, for it facilitates classification, compression, and visualization of high-dimensional data by mitigating undesired properties of high-dimensional spaces. Over the last decade, a large number of new (nonlinear) techniques for dimensionality reduction have been proposed. Most of these techniques are based on the intuition that dat...


Enhanced Estimation of Local Intrinsic Dimensionality Using Auxiliary Distances

Estimating Intrinsic Dimensionality (ID) is of high interest in many machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. Our proposed estimation strategy, ALID, makes use of a subset of the available intra-neighborhood distances to achieve faster convergence with fewer samples, and can thus be used on applications in which th...



Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 17

Publication date: 1995